Key Summary Points

Why carry out this study?

Non-radiographic axial spondyloarthritis (nr-axSpA), a subtype of axial spondyloarthritis (axSpA), is characterized by a substantial burden of illness that is comparable to ankylosing spondylitis.

Biologic disease-modifying anti-rheumatic drugs (bDMARDs), including tumor necrosis factor inhibitors and interleukin inhibitors, are effective treatment options following the failure of non-steroidal anti-inflammatory drugs.

Certolizumab pegol (CZP) has been shown to be effective in the management of patients with nr-axSpA; however, no head-to-head efficacy comparisons vs. other bDMARDs have been reported.

What was learned from this study?

Results of the indirect treatment comparisons showed that patients treated with CZP were significantly more likely to achieve key efficacy responses compared to most bDMARDs such as etanercept, ixekizumab, and secukinumab.

Among patients displaying objective signs of inflammation, CZP was found to be superior to secukinumab (in the MRI−/CRP+ and MRI+/CRP− subgroups) and etanercept (MRI+/CRP− subgroup), and it was comparable to golimumab and ixekizumab across the different subgroups of patients with objective signs of inflammation.

Introduction

Axial spondyloarthritis (axSpA) is a complex inflammatory disease with musculoskeletal and extra-skeletal manifestations [1]. Patients typically present with chronic back pain of inflammatory origin, which is distinct from mechanical back pain; however, half of the patients with axSpA will have a peripheral manifestation (peripheral arthritis, enthesitis, and dactylitis) [2] and one-third will present with an extra-musculoskeletal manifestation (acute anterior uveitis, inflammatory bowel disease, and psoriasis) [2, 3]. Non-radiographic axial spondyloarthritis (nr-axSpA) represents a subtype of axSpA that is characterized by the absence of radiographic structural damage in sacroiliac joints [1]. Although nr-axSpA encompasses earlier stages of axSpA, it has been shown to be associated with a substantial burden of illness that is comparable to the more advanced ankylosing spondylitis in relation to symptoms, work, disability, and health economic costs [4,5,6]. Moreover, when disease activity is uncontrolled, a significant number of patients may progress to structural damage and bone ankylosis [7].

Currently available treatments for nr-axSpA include non-steroidal anti-inflammatory drugs (NSAIDs) and biologic disease-modifying anti-rheumatic drugs (bDMARDs) [8, 9]. Tumor necrosis factor inhibitors (TNFis) were the first biologic treatment approved for nr-axSpA and, more recently, interleukin (IL)-17A inhibitors were also shown to be effective [10]. However, TNFis remain the primary treatment of choice following failure of or intolerance to NSAIDs, as recommended by the 2022 Assessment of Spondyloarthritis International Society (ASAS)/European League Against Rheumatism (EULAR) and the 2019 American College of Rheumatology (ACR) guidelines [8].

Several clinical trials have demonstrated the efficacy of bDMARDs in improving the key clinical efficacy outcomes among patients with nr-axSpA [11,12,13,14,15,16,17]. One such bDMARD is certolizumab pegol (CZP), a PEGylated, Fc-free TNFi, which has shown a positive risk/benefit profile in adult patients with axSpA and an inadequate response to conventional therapy, and has subsequently been approved for the treatment of axSpA in Europe and North America [18,19,20]. The efficacy of CZP was initially demonstrated in the RAPID-axSpA trial, the first study to examine the efficacy of a TNFi in the whole axSpA spectrum (i.e., ankylosing spondylitis and nr-axSpA) [21, 22]. Then, the C-axSpAnd study, the first trial to incorporate a 52-week placebo (PBO)-controlled time period in patients with nr-axSpA and objective signs of inflammation (OSI), demonstrated that adding CZP to the nonbiologic background medication was better than PBO in improving disease activity, physical functioning, and pain among patients with active nr-axSpA [11]; this supported the FDA approval of CZP as the only TNFi for nr-axSpA treatment. Nevertheless, both trials were placebo-controlled and did not include an active comparator arm. Thus, in the absence of direct comparisons of CZP vs. other bDMARDs, indirect treatment comparisons are necessary for informing decisions about treatment choices by clinicians and patients.

OSI currently plays a key role in the management of nr-axSpA. Treatment recommendations from the ASAS/EULAR suggest that patients start with bDMARDs early in the disease process when they have active disease, despite the use (or intolerance/contraindication) of at least two NSAIDs, and have OSI, evidenced by elevated C-reactive protein (CRP) and/or sacroiliitis on magnetic resonance imaging (MRI) [23]. In the context of OSI, it has been shown that positive MRI is the strongest independent predictor of better response to CZP in patients with nr-axSpA [24]. Furthermore, licensed indications of all bDMARDs are restricted to patients with OSI whose presence is also required by most reimbursement policies. To gain a more comprehensive understanding of the comparative efficacy of CZP within the context of the nr-axSpA treatment landscape, a systematic literature review (SLR) and indirect treatment comparison (ITC) of CZP and other bDMARDs (i.e., TNFi and IL inhibitors) were conducted in subgroups of matching nr-axSpA populations defined by prior exposure to bDMARDs, disease duration, baseline OSI status, and assessment timepoints. These results could provide an indirect efficacy comparison in the absence of head-to-head clinical trials, and highlight the need for future research examining use of bDMARDs across the different subgroups of patients (e.g., those with different baseline CRP levels and MRI status).

Methods

Systematic Literature Review

The procedures used in the conduct of the SLR followed the Cochrane Collaboration and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines [25, 26].

Literature Sources and Searches

Literature searches were conducted in MEDLINE, Embase, and the Cochrane Central Register of Controlled Trials (CENTRAL) to identify randomized trials in patients with clearly defined nr-axSpA who had failed at least one NSAID and were treated with selected bDMARDs (i.e., TNFi and IL inhibitors). The searches captured literature published from 1991 through October 2020. Studies reporting only on AS or axSpA that did not report an explicit nr-axSpA subgroup were excluded manually during abstract and full-text screening.

Conference abstracts of the ACR, EULAR, British Society for Rheumatology (BSR), and the International Society for Pharmacoeconomics and Outcomes Research (ISPOR; both the international and European meetings) from the previous year (2019–2020 for the review update performed in October 2020) for each search update were also manually reviewed.

The database and grey literature searches were supplemented by searches of the reference lists of recent (i.e., published since January 2019) systematic reviews, pooled analyses, and meta-analyses captured in the systematic search. These additional sources sought to fill any data gaps in the indexed published literature.

Study Selection Criteria

The population, intervention/comparators, outcomes and study design (PICOS) approach was used to define the inclusion and exclusion criteria. Randomized controlled trials (RCTs) of adult patients with nr-axSpA who had failed at least one NSAID were eligible for inclusion. The publications must have examined adalimumab (ADA), CZP, etanercept (ETN), golimumab (GOL), infliximab (IFX), ixekizumab (IXE), or secukinumab (SEC) and reported on at least one of the following outcomes:

  • Proportion of patients achieving ASAS20 response (a composite clinical outcome defined as ≥ 20% improvement and absolute improvement from baseline of at least 1 unit [on 0–10 scale] in at least three of the four main ASAS domains [patient global, spinal pain, function, and inflammation] and no worsening [by ≥ 20% and 1 unit] in the remaining domain).

  • Proportion of patients achieving ASAS40 response (a composite clinical outcome defined as ≥ 40% improvement and absolute improvement from baseline of at least 2 units [on 0–10 scale] in at least three of the four main ASAS domains [patient global, spinal pain, function, and inflammation] and no worsening [by ≥ 20% and 1 unit] in the remaining domain).

  • Proportion of patients achieving AS Disease Activity Score-Inactive Disease (ASDAS-ID) state (a recommended measure for remission in AS, defined as an ASDAS score of < 1.3). ASDAS, a composite index used to assess disease activity, combines some patient-reported outcomes and acute phase reactants.

  • Proportion of patients achieving ASDAS-Major Improvement (ASDAS-MI) response (a clinical improvement outcome measure based on ASDAS defined as a decrease of at least 2 units relative to baseline).

  • Change from baseline in Bath Ankylosing Spondylitis Disease Activity Index (BASDAI; disease activity index with higher scores indicating greater disease activity).

  • Change from baseline in Bath Ankylosing Spondylitis Functional Index (BASFI; physical function index with higher scores indicating greater functional limitations).

  • Change from baseline in total spinal pain scores assessed using visual analogue scales (spinal pain score with higher scores indicating greater pain intensity).
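The ASAS20 and ASAS40 definitions listed above can be expressed as a small responder check. The following is a minimal sketch that follows the wording of the definitions in this section; the function names, the domain ordering, and the handling of a baseline score of 0 are illustrative assumptions and are not taken from any trial protocol.

```python
from typing import Sequence

# Assumed domain order: patient global, spinal pain, function, inflammation.
# All scores are on a 0-10 scale, where lower values are better.


def _improved(baseline: float, follow_up: float, rel: float, abs_units: float) -> bool:
    """True if the domain improved by at least `rel` (fraction) and `abs_units`."""
    gain = baseline - follow_up
    # Assumption: a baseline of 0 cannot show a relative improvement.
    return baseline > 0 and gain >= abs_units and gain / baseline >= rel


def _worsened(baseline: float, follow_up: float) -> bool:
    """True if the domain worsened by >= 20% and >= 1 unit."""
    loss = follow_up - baseline
    if loss < 1.0:
        return False
    # Assumption: any worsening of >= 1 unit from a baseline of 0 counts.
    return baseline == 0 or loss / baseline >= 0.20


def asas_response(baseline: Sequence[float], follow_up: Sequence[float],
                  rel: float, abs_units: float) -> bool:
    """Responder if >= 3 of 4 domains improve and no other domain worsens."""
    improved = [_improved(b, f, rel, abs_units) for b, f in zip(baseline, follow_up)]
    if sum(improved) < 3:
        return False
    return not any(_worsened(b, f)
                   for b, f, ok in zip(baseline, follow_up, improved) if not ok)


def asas20(baseline: Sequence[float], follow_up: Sequence[float]) -> bool:
    return asas_response(baseline, follow_up, rel=0.20, abs_units=1.0)


def asas40(baseline: Sequence[float], follow_up: Sequence[float]) -> bool:
    return asas_response(baseline, follow_up, rel=0.40, abs_units=2.0)
```

In this sketch, a hypothetical patient who improves by 25–30% in three domains and is unchanged in the fourth would count as an ASAS20 responder but not as an ASAS40 responder.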

Studies were not eligible for inclusion if they did not meet the PICOS criteria. Studies were excluded if they:

  • Population: did not enroll patients diagnosed with nr-axSpA who had failed at least one NSAID;

  • Interventions/comparators: did not examine ADA, CZP, ETN, GOL, IFX, IXE, or SEC;

  • Outcomes: did not report on outcomes or timepoints of interest; or,

  • Study design: were animal or in vitro studies, phase I studies, non-randomized trials including single-arm trials, quasi-experimental studies, observational studies, or SLRs/meta-analyses.

Screening and Extraction

Studies were identified for inclusion using a two-level screening process. First, two reviewers used the predefined inclusion and exclusion criteria to evaluate the titles and abstracts of the records captured in the searches. Then, they retrieved the full-text articles of any abstracts that passed the first level of screening and used the same criteria to evaluate these publications. Studies were only included in the review if they met all of the protocol-specified inclusion criteria and none of the exclusion criteria (Supplementary Material; Table S1). Any disagreements between the two reviewers were resolved by a third reviewer.

Standardized data extraction tables were developed using Microsoft Excel® and were used to capture evidence from each of the primary studies included in the current review. A single reviewer extracted the key data from each publication into the tables; data elements included study characteristics and patient characteristics (including demographic characteristics, disease duration, and prior treatment), treatment details, and outcomes of interest for each RCT. All entries were then reviewed by a senior researcher to ensure consistency and accuracy. The risk of bias associated with each included trial was assessed by one reviewer and validated by a senior reviewer using the Cochrane Collaboration’s tool [27]. Studies were evaluated in six domains including selection, performance, detection, attrition, reporting, and other bias, with an overall risk of bias judgement awarded to each study (i.e., high, low, or some concerns).

This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.

Indirect Treatment Comparison

Feasibility Assessment

A feasibility assessment was conducted to determine the optimal analytic approach to compare the efficacy of CZP vs. the other bDMARDs. We considered clinical heterogeneity across studies in order to satisfy the core ITC assumptions of transitivity and consistency, and also considered the statistical approach (e.g., network meta-analysis [NMA] or other type of ITC). We compared study designs, enrollment criteria, patient characteristics (including age, sex, race, disease/symptom duration, comorbidities, HLA-B27 status, baseline CRP levels, baseline MRI status of the sacroiliac joints, and concomitant and prior treatment), timepoints of interest, and efficacy outcome definitions of interest. Other considerations included data availability for overall nr-axSpA populations and subpopulations of interest (by prior bDMARD status and baseline OSI).

For all bDMARDs other than CZP, only one trial per treatment was identified; all trials were PBO-controlled, with no studies directly comparing any two or more bDMARDs. The feasibility assessment identified notable differences across trials of different bDMARDs with respect to measurement timepoints, patient populations, and baseline patient characteristics. Baseline patient characteristics, particularly disease duration, prior exposure to bDMARDs, and OSI status, which were identified a priori, were considered to be potential effect modifiers. This implied that between-trial variation on these variables might introduce bias into any NMA based on full intent-to-treat (ITT) data from all trials (i.e., by violating the ‘transitivity’ assumption). Thus, instead, a series of simple Bucher pairwise ITCs was pursued [28], each in matched subgroups of populations and timepoints.

Analysis Approach

Whenever data from both the CZP 200 mg every 2 weeks (Q2W) and 400 mg every 4 weeks (Q4W) arms were available, they were pooled (as these are considered equivalent with the same total dose) and an indirect comparison was performed of the pooled CZP data vs. the comparator. In contrast, results of the different dosing strengths and schedules of IXE or SEC were not pooled, as the total doses were not equivalent. Due to anticipated differences in the comparator trial populations (based on prior treatment and disease duration) and timepoints (12, 16, and 52 weeks), the base case ITCs were designed to reflect the ITT nr-axSpA populations of the included studies; these were conducted using bespoke post-hoc subgroup analyses (UCB; data on file [29]) of the CZP trials (RAPID-axSpA and C-axSpAnd), with the optimal match on population and timepoint for each comparator trial (e.g., we used the outcome findings at 12 weeks in bDMARD-naïve nr-axSpA patients with symptom duration between > 3 months and < 5 years from the C-axSpAnd and RAPID-axSpA trials to match those of the EMBARK trial [12]). Concerning the differences in exposure to prior bDMARD treatment, the subgroups of patients who were bDMARD-naïve were used for the ITC when possible. This reduced the risk of bias that may have been caused by varying study populations and timepoints across the different indirect comparisons.

In addition to the base case analyses, subgroup analyses based on baseline MRI and CRP status were conducted when feasible (based on data availability). These subgroup analyses were conducted on ASAS40, as this was the outcome most frequently reported for these subgroups. The CRP level-based and MRI results-based subgroup analyses from the comparator studies were compared vs. C-axSpAnd, as this study stratified randomization by baseline CRP level and MRI results. These were stratification factors for randomization in EMBARK (ETN), GO-AHEAD (GOL), COAST-X (IXE), and PREVENT (SEC), and analyses of these subgroups did not break randomization. Given the differences in laboratories and definitions of the upper limit of normal (ULN) for CRP level in comparisons across studies, CRP subgroups could not be replicated in C-axSpAnd based on the same cut-off value of CRP. Therefore, for comparisons, CRP levels of both 5 mg/l (> 5 mg/l and ≤ 5 mg/l) and 10 mg/l (> 10 mg/l and ≤ 10 mg/l) were used as cut-off points for subgroups for C-axSpAnd, in order to examine the impact of varying CRP thresholds.
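To illustrate why two CRP cut-offs were examined, the sketch below assigns a patient to an OSI subgroup from baseline MRI status and CRP level. The function name, the label format, and the example CRP value are hypothetical; the only elements taken from the text are the 5 mg/l and 10 mg/l thresholds and the "greater than" rule for a positive CRP.

```python
def osi_subgroup(mri_positive: bool, crp_mg_per_l: float, crp_cutoff: float = 5.0) -> str:
    """Label a patient's OSI subgroup, e.g. 'MRI+/CRP-'.

    CRP is treated as positive when it is strictly above the cut-off (mg/l).
    """
    crp_positive = crp_mg_per_l > crp_cutoff
    mri_sign = "+" if mri_positive else "-"
    crp_sign = "+" if crp_positive else "-"
    return f"MRI{mri_sign}/CRP{crp_sign}"


# A hypothetical MRI-positive patient with a baseline CRP of 7 mg/l is
# classified differently under the two cut-offs examined for C-axSpAnd.
label_at_5 = osi_subgroup(True, 7.0, crp_cutoff=5.0)
label_at_10 = osi_subgroup(True, 7.0, crp_cutoff=10.0)
```

Patients whose CRP falls between the two thresholds are the ones who move between subgroups, which is why running both cut-offs serves as a check on the sensitivity of the subgroup results to the ULN definition.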

Outcomes

The planned ITC analyses are presented in Table 1.

Table 1 Individual pairwise comparisons

The binary outcomes of interest with validated thresholds [30] included the following:

  • The proportion of patients achieving ASAS20

  • The proportion of patients achieving ASAS40

  • ASDAS-ID (ASDAS < 1.3)

  • ASDAS-MI (a change of ≥ 2 units, compared with baseline).
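The two ASDAS-based binary outcomes reduce to simple threshold rules once ASDAS scores are available. A minimal sketch follows, assuming the scores have already been computed; the function names are illustrative and the ASDAS formula itself is not reproduced here.

```python
def asdas_inactive_disease(asdas: float) -> bool:
    """ASDAS-ID: an ASDAS score below 1.3."""
    return asdas < 1.3


def asdas_major_improvement(baseline_asdas: float, follow_up_asdas: float) -> bool:
    """ASDAS-MI: a decrease of at least 2 units relative to baseline."""
    return (baseline_asdas - follow_up_asdas) >= 2.0
```

Note that the two outcomes are not nested: a patient with a high baseline score can reach ASDAS-MI without reaching ASDAS-ID, and a patient with a lower baseline score can reach ASDAS-ID without a 2-unit decrease.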

Continuous outcomes of interest included [30]:

  • Mean change from baseline in BASDAI

  • Mean change from baseline in BASFI

  • Mean change from baseline in total spinal pain score.

Statistical Methods

The Bucher approach is a method of conducting ITCs, which yields nearly identical results to a Bayesian NMA in a pairwise comparison (i.e., when only two active treatments are compared via a common comparator) [31]. Each Bucher ITC of CZP vs. the comparators of interest was conducted in two stages. First, where applicable, results from the two CZP trials were pooled to obtain the direct estimate of CZP vs. PBO for a given outcome and its standard error (SE) using classical (frequentist) meta-analysis. The random-effects (RE) model, which assumes that the true effects are randomly distributed around an average effect across all populations, was used as the primary analysis in the first stage to account for any unexplained heterogeneity in the trials’ populations and study estimates. A fixed-effects (FE) meta-analysis, which assumes that all included studies share a common effect size, was run as a sensitivity analysis. When a single set of data (i.e., data from only one study or already pooled data) was available for a pair of treatments, such data served as the direct estimate (no further pooling was required) for this comparison. In the second stage, the direct estimates of CZP vs. PBO and each comparator vs. PBO were used to compute the Bucher ITC estimate of CZP vs. each comparator.
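The two-stage procedure can be sketched in a few lines. The example below is an illustration only and is not the code used for the analyses: it pools effect estimates (log odds ratios or mean differences) by inverse-variance weighting, uses the DerSimonian-Laird estimator for the between-study variance (an assumption, as the estimator is not specified in the text), and then forms the Bucher indirect estimate as the difference of the two direct estimates vs. PBO, with variances summed. All numeric inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pool_fixed(estimates: Sequence[float], ses: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def pool_random_dl(estimates: Sequence[float],
                   ses: Sequence[float]) -> Tuple[float, float, float]:
    """Random-effects pooling; returns (estimate, SE, tau^2).

    tau^2 uses the DerSimonian-Laird moment estimator (assumed here).
    """
    k = len(estimates)
    if k < 2:
        return estimates[0], ses[0], 0.0  # single study: nothing to pool
    weights = [1.0 / se ** 2 for se in ses]
    fixed, _ = pool_fixed(estimates, ses)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / c)
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    total = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total), tau2


def bucher(d_a_vs_pbo: float, se_a: float,
           d_b_vs_pbo: float, se_b: float) -> Tuple[float, float]:
    """Indirect estimate of A vs. B via the common PBO comparator."""
    return d_a_vs_pbo - d_b_vs_pbo, math.sqrt(se_a ** 2 + se_b ** 2)


# Stage 1: pool two hypothetical CZP vs. PBO log odds ratios.
czp_log_or, czp_se, tau2 = pool_random_dl([1.6, 1.2], [0.40, 0.45])
# Stage 2: compare with a hypothetical single-trial comparator vs. PBO estimate.
itc_log_or, itc_se = bucher(czp_log_or, czp_se, 0.5, 0.35)
itc_or = math.exp(itc_log_or)  # back-transform to the odds-ratio scale
```

Because the indirect variance is the sum of the two direct variances, the Bucher estimate is always less precise than either direct comparison, which is one reason the confidence intervals of indirect comparisons tend to be wide.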

Findings were considered statistically significant if the 95% confidence interval (95% CI) did not include 0 (for the mean differences [MD]) or 1 (for odds ratios [OR]). The MDs of change were in favor of CZP when differences were < 0. The odds of achieving responses were in favor of CZP when ORs were > 1. Whenever possible, the quantitative analyses were performed using the intention-to-treat (ITT) or modified ITT study population (with ITT data used in preference to mITT if both were available). Meta-analyses were carried out using the metafor package in R 4.0.3 [32].
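The significance rule described above can be written as a short check on a reported interval. The sketch below is illustrative: the helper names are hypothetical, the normal-approximation interval is an assumption, and the example intervals are values reported in this article's Results and Discussion.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def ci_95(estimate: float, se: float, log_scale: bool = False) -> Tuple[float, float, float]:
    """Point estimate with a 95% CI; exponentiated when the input is a log OR."""
    lower, upper = estimate - Z_95 * se, estimate + Z_95 * se
    if log_scale:
        return math.exp(estimate), math.exp(lower), math.exp(upper)
    return estimate, lower, upper


def is_significant(lower: float, upper: float, null_value: float) -> bool:
    """Significant when the CI excludes the null (1 for ORs, 0 for MDs)."""
    return not (lower <= null_value <= upper)


def favors_czp(point: float, kind: str) -> bool:
    """Direction rule: OR > 1 or MD < 0 is in favor of CZP."""
    if kind == "OR":
        return point > 1.0
    if kind == "MD":
        return point < 0.0
    raise ValueError("kind must be 'OR' or 'MD'")


# ASDAS-ID, CZP vs. GOL at 16 weeks: OR 4.19 (1.09, 16.14)
gol_sig = is_significant(1.09, 16.14, null_value=1.0)
# ASAS40, pooled CZP vs. ADA at 12 weeks: OR 2.13 (0.89, 5.13)
ada_sig = is_significant(0.89, 5.13, null_value=1.0)
# BASDAI, pooled CZP vs. ADA at 12 weeks: MD -0.94 (-1.53, -0.34)
basdai_sig = is_significant(-1.53, -0.34, null_value=0.0)
```

Applied to these three intervals, the rule classifies the first and third as statistically significant and the second as not significant, even though all three point estimates lie in the direction favoring CZP.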

Results

Systematic Literature Review

SLR Search Results

The SLR searches yielded 2063 unique publications, of which 118 (reporting on ten unique trials) met the inclusion criteria for the qualitative synthesis. Three trials were not eligible for quantitative synthesis: (1) Haibel 2008 had a non-comparable nr-axSpA population with moderate-to-severe nr-axSpA [33]; (2) ABILITY-3 only enrolled patients who achieved sustained remission during the initial non-randomized phase [34]; (3) PREVAS, whose results were published in abstract form only at the time the systematic review was conducted, examined very early treatment in patients with suspected nr-axSpA [35]. This left a total of seven trials which met the inclusion criteria for the Bucher ITCs. The flow of included studies in the SLR and quantitative synthesis is summarized in Fig. 1.

Fig. 1 PRISMA study attrition diagram for systematic literature review. AS ankylosing spondylitis, axSpA axial spondyloarthritis, CENTRAL Cochrane Central Register of Controlled Trials, DMARD disease-modifying anti-rheumatic drug, ITC indirect treatment comparison, NSAID non-steroidal anti-inflammatory drug, PD pharmacodynamics, PK pharmacokinetics, PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, RCT randomized controlled trial, SLR systematic literature review

Study and Patient Characteristics

A summary of the study and patient characteristics of the included trials is presented in Table 2. All seven trials were double-blind, PBO-controlled, and multicenter [12,13,14,15,16, 36, 37]. Randomized phases of the studies ranged between 12 and 52 weeks. Sample sizes in the different treatment arms ranged between 40 and 186 patients. Although the trials had generally homogeneous populations, with all seven using the ASAS classification criteria for axSpA [38], EMBARK [12] and GO-AHEAD [13] limited enrollment to patients with earlier disease/symptom duration than the other studies included in the ITC (mean time since symptom onset of 2.5– ≤ 5 years [12, 13] compared to 7.80–11.30 years [16, 36]). Four trials included only patients who were naïve to TNFis [12,13,14, 16], whereas C-axSpAnd, RAPID-axSpA [11, 21], and PREVENT [15, 36, 37] included some patients who were previously exposed to TNFis (5.70%, 14–20%, and 9.70%, respectively). The subgroups of patients who were TNFi-naïve were used for the ITC when available. Sex distribution was comparable across studies (males: 43–64% [12, 14]), and mean age across the trials ranged from 30 to 41 years [13, 16].

Table 2 Study characteristics of RCTs

All trials had a comparable proportion of patients with elevated CRP levels. However, the threshold for the ULN was not consistent across studies (> 5 mg/l, > 15 mg/l, or > 1 × ULN, or > 9 or 9.99 mg/l), as different studies used different central laboratories. In addition, EMBARK and PREVENT used high-sensitivity (hs) CRP levels, for which low values cannot be compared to CRP levels directly.

The mean baseline CRP levels of nr-axSpA patients across the trials reporting this data were as follows:

  • ABILITY-1: 6.80–7.20 mg/l

  • COAST-X: 12.10–12.30 mg/l

  • GO-AHEAD: 13–15 mg/l

  • C-axSpAnd: 16.10–17.90 mg/l

  • RAPID-axSpA (AS001): 13.40–19.30 mg/l

Note: The RAPID-axSpA trial evaluated both AS and nr-axSpA patients; only nr-axSpA patients from this trial are included for the purpose of this manuscript.

Risk-of-Bias Assessment

Overall, evaluated trials had a low risk of bias, though the risk was unclear in some instances, particularly for randomization methods and blinding of study outcome assessments (Supplementary Material; Table S2). While all evaluated trials were randomized, the specific methodology and type of randomization (e.g., simple, block, stratified) were often unclear. Studies were also evaluated for other sources of bias not covered in the predefined domains; all trials were designated as low risk for additional biases, except EMBARK, which inconsistently defined its primary outcome (ASAS20 or ASAS40) across sources [12, 39, 40].

Indirect Treatment Comparison

Statistical Heterogeneity

Opportunities to investigate statistical heterogeneity in the results were minimal, as the only comparison in the ITCs with more than one trial was CZP vs. PBO, which included the C-axSpAnd and RAPID-axSpA trials in some analyses. For most of the included outcomes, either no notable or statistically significant heterogeneity was found (I² = 0; tau = 0) or only minimal heterogeneity (I² < 25%) was identified. Nevertheless, modestly elevated heterogeneity was identified in ASDAS-MI among the bDMARD-naïve subgroup at 12 weeks (I² of 32.20%).
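For reference, the I² statistic reported here is derived from Cochran's Q. The sketch below shows the standard calculation for a set of study estimates and standard errors; the function name and the input values are hypothetical and are not the trial data.

```python
from typing import Sequence, Tuple


def cochran_q_i2(estimates: Sequence[float], ses: Sequence[float]) -> Tuple[float, float]:
    """Return Cochran's Q and I^2 (as a percentage) for the given studies."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2 is truncated at 0 when Q does not exceed its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Two hypothetical CZP vs. PBO log odds ratios with their standard errors.
q_stat, i2_pct = cochran_q_i2([1.6, 1.2], [0.40, 0.45])
```

With only two trials, Q has a single degree of freedom, so I² is estimated imprecisely and an I² of 0 should not be read as proof that the trials are homogeneous.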

Results for the Main Analyses

The key results from the ITCs of CZP vs. the different comparators are presented in Table 3, Fig. 2a and Supplementary Figs. 1–8. Statistically significant advantages for CZP were seen against each comparator treatment for at least one outcome; when differences were not statistically significant, a numerical advantage was seen for CZP compared to all other interventions across all endpoints.

Table 3 Summary of base-case ITC results (CZP vs. comparator bDMARDs at 12–16 weeks)
Fig. 2 ASAS40 ITC results. *CZP Pooled (CZP 200 mg Q2W + CZP 400 mg Q4W) is analyzed instead of CZP 200 mg. **Random-effects model (analyses are otherwise fixed-effects models). ADA adalimumab, ASAS Assessment in Ankylosing Spondylitis, CI confidence interval, CRP C-reactive protein, CZP certolizumab pegol, ETN etanercept, GOL golimumab, IXE ixekizumab, LD loading dose, MRI magnetic resonance imaging, NL non-loading dose, OR odds ratio, Q2W every other week, Q4W every 4 weeks, SEC secukinumab

CZP vs. ADA (number of studies included in the ITC, n = 3): At 12 weeks, statistically significant differences favoring pooled CZP over ADA 40 mg were estimated for BASDAI (with differences of mean change from baseline (95% CI) of − 0.94 [− 1.53, − 0.34]), BASFI (− 1.27 [− 1.99, − 0.55]) and total spinal pain (− 0.98 [− 1.66, − 0.29]) among patients who were naïve to prior bDMARD therapy. However, no significant differences were found between the two treatments for ASAS20/40 and ASDAS-ID/MI responses.

CZP vs. ETN (n = 2): At 12 weeks, CZP 200 mg showed statistically significantly higher odds of achieving ASAS20, ASAS40, and ASDAS-ID response compared with ETN 50 mg with ORs (95% CIs) of 4.78 (1.89, 12.09), 4.90 (1.65, 14.56), and 10.60 (1.65, 68.26), respectively, among patients who were naïve to prior bDMARD therapy with symptom duration between 3 months and 5 years. CZP 200 mg also demonstrated statistically significant improvement in BASDAI and BASFI scores in this population compared to ETN, with differences of mean change from baseline and corresponding 95% CIs of − 1.59 (− 2.63, − 0.55) and − 1.67 (− 2.53, − 0.81), respectively.

CZP vs. GOL (n = 2): At 16 weeks, CZP 200 mg had statistically significantly higher odds of achieving ASDAS-ID response than GOL (OR: 4.19; 95% CI: 1.09, 16.14) among patients who were naïve to prior bDMARD therapy with disease duration no more than 5 years. Results for ASAS20, ASAS40, BASDAI, BASFI, and total spinal pain were not statistically significant.

CZP vs. IXE (n = 3): At 16 weeks, bDMARD-naïve patients who received CZP 200 mg were statistically significantly more likely than those receiving IXE Q2W and IXE Q4W to achieve response on the binary outcomes of ASAS20 (vs. Q2W: OR [95%CI] of 2.82 [1.39, 5.74], vs. Q4W: 3.15 [1.54, 6.43]), ASAS40 (vs. Q2W: 2.69 [1.09, 6.61], vs. Q4W: 3.30 [1.33, 8.18]), and ASDAS-ID (vs. Q2W: 3.96 [1.04, 14.91], vs. Q4W: 5.10 [1.34, 19.42]). CZP 200 mg also yielded statistically significant improvements in scores on the continuous outcomes of BASDAI (vs. Q2W: MD [95%CI] of − 1.09 [− 1.85, − 0.34], vs. Q4W: − 1.43 [− 2.19, − 0.68]), BASFI (vs. Q2W: − 1.02 [− 1.86, − 0.19], vs. Q4W: − 1.29 [− 2.13, − 0.46]), and total spinal pain (vs. Q2W: − 1.28 [− 2.10, − 0.46], vs. Q4W: − 1.52 [− 2.35, − 0.69]). At 52 weeks, CZP 200 mg maintained a statistically significant advantage in ASAS40 over IXE Q4W (OR: 2.50; 95% CI: 1.02, 6.13); however, this was not maintained vs. IXE Q2W. Results were also not significantly in favor of CZP 200 mg for improvements in BASDAI at 52 weeks.

CZP vs. SEC (n = 2): In comparison to both SEC non-loading dose (NL) and SEC loading dose (LD) at 16 weeks in patients with nr-axSpA regardless of prior treatment, there were statistically significant results favoring pooled CZP for ASAS20 (NL: OR [95%CI] of 3.37 [1.88, 6.01], LD: 3.56 [2.00, 6.36]), ASAS40 (NL: 3.76 [1.98, 7.12], LD: 3.88 [2.05, 7.35]), and BASFI (NL: − 1.53 [− 2.46, − 0.60], LD: − 1.38 [− 2.31, − 0.45]). Trends for ASAS40 were similar for the bDMARD-naïve nr-axSpA population (NL: 4.35 [1.98, 9.56], LD: 4.48 [2.04, 9.85]), and the statistical significance was maintained through 52 weeks in both populations. The comparison for ASDAS-ID estimated statistically significant results favoring CZP over SEC LD at 16 weeks (2.62 [0.86, 8.01]) but not at 52 weeks. No statistically significant results in favor of CZP were found for ASDAS-MI and BASDAI, and no statistically significant differences for ASDAS-ID were seen between CZP and SEC NL at 12 and 52 weeks, or for SEC LD at 52 weeks.

MRI/CRP Subgroup Results

All trials included in the SLR reported subgroup ASAS40 data by baseline MRI or CRP status except for RAPID-axSpA; however, the trials varied in terms of whether subgroup data were provided for MRI-positive or -negative and/or CRP-positive and -negative groups. A summary of the ASAS40 findings by baseline MRI/CRP status is presented in Table 4.

Table 4 Base case and subgroup ASAS40 response

The baseline rate of elevated CRP in the CZP arm of the C-axSpAnd trial was 52.60%. Across the comparator trials included in the subgroup ITC analyses, the baseline rates of elevated CRP ranged from 32% (ABILITY-1) to 58% (PREVENT). The baseline percentages of patients with sacroiliitis on MRI in the CZP and PBO arms of C-axSpAnd were 74.70% and 74.80%, respectively. In the comparator trials, the baseline rates of sacroiliitis on MRI ranged from 46% (ABILITY-1) to 82% (EMBARK).

The ITC results in these subgroups demonstrated that CZP was superior to SEC among patients who were MRI+/CRP− or MRI−/CRP+ and to ETN among patients who were MRI+/CRP− at improving ASAS40. CZP was comparable to GOL and IXE in improving ASAS40 across the different subgroups. The results for these analyses were nearly identical regardless of the cut-off used to define normal CRP levels (i.e., 5 mg/l or 10 mg/l).

MRI−/CRP+ subgroup: Among the subgroup of patients who were MRI−/CRP+, CZP 200 mg was associated with statistically significantly higher odds of achieving ASAS40 compared with SEC 150 mg LD and NL at week 16, with ORs (95% CIs) of 4.56 (1.03, 20.21) and 5.29 (1.18, 23.60), respectively (Fig. 2c). In this subgroup, we did not find statistically significant differences between CZP treatment and other bDMARDs (ETN, GOL, and IXE) in achieving ASAS40.

MRI+/CRP− subgroup: Among the subgroup of patients who were MRI+/CRP−, CZP 200 mg was associated with statistically significantly higher odds of achieving ASAS40 compared with ETN 50 mg, SEC 150 mg LD or NL at week 16, with ORs (95% CIs) of 11.11 (1.92, 64.3), 4.96 (1.77, 13.88) and 3.95 (1.41, 11.03), respectively (Fig. 2d). Similar to the MRI−/CRP+ subgroup, CZP treatment in patients with MRI+/CRP− was associated with comparable odds of achieving ASAS40 relative to GOL and IXE.

MRI+/CRP+ subgroup: Among the subgroup of patients who were MRI+/CRP+, there were no statistically significant results favoring CZP 200 mg over any of the comparators at week 16 (Fig. 2b). Across the trials in this analysis, the proportion of bDMARD-treated patients with ASAS40 response was higher in the MRI+/CRP+ subgroups compared with the ITT populations (Table 4).

Discussion

The SLR identified seven RCTs in nr-axSpA that were eligible for inclusion in the clinical efficacy ITCs. These trials examined ADA, ETN, GOL, CZP, IXE, and SEC. Results of the ITC showed that, at 12 to 16 weeks, patients who received CZP were statistically significantly more likely to achieve ASAS20 and ASAS40 responses compared to those receiving ETN, IXE (Q2W or Q4W), and SEC (with or without LD), but no differences were found compared to ADA or GOL. Also, patients receiving CZP were statistically significantly more likely to achieve ASAS40 at 52 weeks compared to IXE Q4W and SEC (with or without LD). Patients who received CZP were statistically significantly more likely to have an ASDAS-ID response than those receiving ETN, GOL, IXE (Q2W or Q4W), and SEC LD. However, no significant differences were found in achieving ASDAS-MI response. For continuous outcomes, CZP yielded statistically significantly greater improvements in BASDAI scores than ADA, ETN, and IXE (Q2W or Q4W), in BASFI scores than ADA, ETN, IXE (Q2W or Q4W), and SEC (with or without LD), and in total spinal pain than ADA, ETN, and IXE (Q2W or Q4W).

While the base-case analysis included patients with or without OSI, it should be noted that the subgroups of patients with OSI may be more likely to be treated in clinical practice. When examining the ITC results across the different OSI subgroups, the statistically significant benefit of CZP in achieving ASAS40 was maintained only vs. SEC (in the MRI+/CRP− and MRI−/CRP+ subgroups) and ETN (in the MRI+/CRP− subgroup), and there were no statistically significant differences compared to GOL or IXE in any of the OSI subgroups. CZP generally performed better in populations with normal CRP (MRI+/CRP−) regardless of the specific cut-off threshold used for defining normal CRP (i.e., ≤ 5 mg/l or ≤ 10 mg/l), though results vs. GOL and IXE were not statistically significant. CZP was also shown to have similar efficacy to the other bDMARDs in the MRI+/CRP+ subgroup; however, ORs vs. the different comparators were lower than those seen in the base case. This could potentially be attributed to the increased likelihood of achieving ASAS40 responses among MRI+/CRP+ patients.

A previously published NMA examined the efficacy of bDMARDs in patients with nr-axSpA. Kiri et al. [41] presented a Bayesian NMA, published in abstract form, that examined ASAS40 in bDMARD-naïve patients with nr-axSpA using the same set of studies included in our ITC. The conclusions of this NMA for CZP vs. the other bDMARDs were consistent with our ITC results, with the point estimates being nearly identical for four out of the five comparators. For instance, the OR (95% credible interval) for CZP pooled vs. ADA at 12 weeks in the Bayesian NMA [41] was 2.14 (0.87, 5.18) compared to an OR (95% CI) of 2.13 (0.89, 5.13) obtained in the current Bucher ITC analysis. This agreement highlights the comparability of the Bucher ITC approach with the Bayesian NMA approach in a small network with only comparisons vs. PBO.
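To illustrate the kind of calculation that underlies a Bucher ITC, the following minimal Python sketch derives an indirect OR for two treatments that were each compared with PBO. It assumes the standard Bucher formulation (difference of log ORs, with variances summed) and recovers standard errors from reported 95% CIs. The function names and all input values are hypothetical and are not taken from any of the included trials or from the analyses reported here.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def se_from_ci(lower, upper):
    """Approximate the standard error of a log OR from its 95% CI bounds."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def bucher_itc(or_a, ci_a, or_b, ci_b):
    """Indirect OR of treatment A vs. B through a common comparator (e.g., PBO).

    or_a, ci_a: OR and (lower, upper) 95% CI for A vs. the common comparator.
    or_b, ci_b: OR and (lower, upper) 95% CI for B vs. the common comparator.
    """
    # The indirect log OR is the difference of the two direct log ORs.
    log_or = math.log(or_a) - math.log(or_b)
    # The trials are independent, so the variances add.
    se = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    lower = math.exp(log_or - Z_95 * se)
    upper = math.exp(log_or + Z_95 * se)
    return math.exp(log_or), (lower, upper)


# Hypothetical inputs for illustration only (not trial data).
or_ab, (lo, hi) = bucher_itc(4.0, (2.0, 8.0), 2.0, (1.0, 4.0))
significant = lo > 1.0 or hi < 1.0  # CI excluding 1 indicates significance
print(f"Indirect OR = {or_ab:.2f} (95% CI {lo:.2f}, {hi:.2f}); significant: {significant}")
```

Because the indirect variance is the sum of the two direct variances, the indirect CI is always wider than either direct CI, which is one reason small subgroups tend to yield non-significant indirect comparisons.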

Our study had several strengths. We utilized a comprehensive search strategy to identify the most relevant literature that would capture the treatment landscape of nr-axSpA. Additionally, conducting individual pairwise comparisons (rather than an overall NMA) allowed the use of bespoke post-hoc subgroup analyses of the CZP trials to provide the best fit and match for individual comparator populations and timepoints. We also examined the differences across the different OSI subgroups. This ultimately generated comparisons that were the most appropriate for each individual comparator treatment. In contrast, ITCs or NMAs using the primary endpoints from all trials would be subject to substantial clinical heterogeneity, which may potentially impact the validity of the results, given potential effect modification by timepoint, prior bDMARD exposure, disease duration, and OSI.

Our SLR and ITC were limited by the variation across the identified studies with respect to how CRP levels were measured and the different ULNs. For example, COAST-X used a CRP threshold of 5 mg/l, EMBARK used a high-sensitivity CRP threshold of 3 mg/l, and GO-AHEAD used a CRP threshold above ULN (0.9 mg/dl). This variation in measurement and thresholds may result in differential inclusion of patients into the normal and high CRP subgroups. To address this heterogeneity, we analyzed CRP levels using thresholds of both 5 mg/l and 10 mg/l and found that the impact was minimal and did not affect the results. Additionally, PREVENT and EMBARK measured high-sensitivity CRP. This may particularly impact the interpretation of ASDAS endpoints, which incorporate CRP levels into the score. There is no available approach for converting the various CRP levels measured using different techniques (standard vs. high-sensitivity assays), limiting opportunities to align across studies. Therefore, findings of the ASDAS endpoints must be interpreted with caution. In addition, the pairwise ITCs were conducted in subgroups of patients that were matched by a limited set of potential effect modifiers identified a priori, including prior exposure to bDMARDs, disease duration, baseline OSI status, and assessment timepoints. While it would have been ideal to adjust for additional factors such as gender, there is no substantial evidence supporting the presence of an interaction effect between gender and treatment in patients with nr-axSpA [42]. Moreover, gender was generally balanced across the included trials with only slightly higher proportions of men in the EMBARK and GO-AHEAD trials that are unlikely to impact our ITC results. Finally, our study only evaluated efficacy outcomes and did not examine the comparative safety for CZP vs. other bDMARDs.
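The unit differences among the thresholds quoted above can be made explicit with a short Python sketch. It only rescales mg/dl to mg/l (1 mg/dl = 10 mg/l); it does not convert between standard and high-sensitivity assays, for which no approach is available. The dictionary layout and the function name are illustrative assumptions.

```python
MG_PER_L_IN_ONE_MG_PER_DL = 10.0  # 1 dl = 0.1 l, so 1 mg/dl = 10 mg/l


def to_mg_per_l(value, unit):
    """Express a CRP threshold in mg/l; accepts 'mg/l' or 'mg/dl'."""
    if unit == "mg/l":
        return value
    if unit == "mg/dl":
        return value * MG_PER_L_IN_ONE_MG_PER_DL
    raise ValueError(f"Unsupported unit: {unit}")


# Thresholds as quoted in the text, each in its originally reported unit.
reported = {
    "COAST-X": (5.0, "mg/l"),
    "EMBARK": (3.0, "mg/l"),    # high-sensitivity assay
    "GO-AHEAD": (0.9, "mg/dl"),
}

harmonized = {trial: to_mg_per_l(v, u) for trial, (v, u) in reported.items()}
for trial, threshold in harmonized.items():
    print(f"{trial}: {threshold:.1f} mg/l")
```

On a common scale, the three thresholds span roughly 3 to 9 mg/l, which is the range bracketed by the 5 mg/l and 10 mg/l cut-offs used in the sensitivity analysis.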

The subgroup analyses by OSI status are further limited by sample size, which decreased substantially compared to the base-case analysis; treatment arms ranged from 24 to 69 patients in the MRI+/CRP+ subgroups, 22 to 80 patients in the MRI+/CRP− subgroups, and 7 to 52 patients in the MRI−/CRP+ subgroups. This contrasts with 91 to 186 patients in the base-case analysis of ASAS40. Such small sample sizes may have undermined the statistical power to detect significant differences in the subgroups. Despite the variations in measurement and small sample sizes, clear trends in changes were seen vs. the base case with respect to baseline CRP levels in the subgroup analyses.

Randomized controlled trials are regarded as the gold standard for evaluating the relative efficacy between treatments of interest. However, in the absence of head-to-head comparisons, ITC techniques fill an important gap by generating comparative estimates between treatments of interest. Nevertheless, ITCs are not devoid of limitations. In particular, the distributions of known and unknown treatment effect modifiers need to be balanced across the trials included in the network to ensure that unbiased ITC estimates are generated. This is referred to as the transitivity assumption [31]. Because we created more homogeneous subgroups by matching trial populations on potential effect modifiers identified a priori in a series of ITCs, the transitivity assumption is expected to be met. In addition, the consistency of direct and indirect evidence is another assumption that needs to be met to ensure unbiased ITC estimates; however, this was not applicable in our study because there were no head-to-head RCTs comparing CZP to any of the comparators. Lastly, for ITC results to be interpreted with confidence, effect estimates that are pooled from multiple sources need to be relatively homogeneous. In scenarios where data from the two CZP trials (RAPID-axSpA and C-axSpAnd) were pooled, the heterogeneity was low to modest for the different outcomes that were examined (I² ≤ 32.2%).
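For readers less familiar with the I² statistic, the following Python sketch shows one common way it can be computed when two trial estimates are pooled. It assumes inverse-variance fixed-effect weights and the usual definition of I² from Cochran's Q; the pooling model used in the present analysis is not restated here, and the function name and input values are hypothetical rather than taken from RAPID-axSpA or C-axSpAnd.

```python
import math


def pool_and_i2(log_effects, std_errors):
    """Inverse-variance pooled estimate, Cochran's Q, and I-squared (%)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    # I-squared is the share of variability beyond chance, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical log ORs and standard errors for two trials (not trial data).
pooled_similar, q_similar, i2_similar = pool_and_i2(
    [math.log(3.0), math.log(2.0)], [0.4, 0.4]
)
pooled_diff, q_diff, i2_diff = pool_and_i2([0.0, 1.0], [0.25, 0.25])

print(f"Similar trials:   pooled OR = {math.exp(pooled_similar):.2f}, I2 = {i2_similar:.1f}%")
print(f"Divergent trials: pooled OR = {math.exp(pooled_diff):.2f}, I2 = {i2_diff:.1f}%")
```

With only two trials there is a single degree of freedom, so I² is imprecise and is best read as a rough indicator rather than a definitive measure of heterogeneity.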

Conclusions

This SLR identified seven randomized trials of bDMARDs in nr-axSpA, published from 1991 through October 2020, that were appropriate for ITC analysis. The base-case ITC found that CZP was statistically significantly better than other TNFis or IL-17 inhibitors for the majority of the clinical outcomes assessed. There were no instances where CZP was statistically significantly less favorable than any of the bDMARDs for any of the outcomes assessed.

Among patients with OSI, CZP was found to be superior to SEC (in the MRI−/CRP+ and MRI+/CRP− subgroups) and ETN (MRI+/CRP− subgroup), and it was comparable to GOL and IXE across the different OSI subgroups. Although limited by small sample sizes, these findings may indicate a need for further studies evaluating different patients’ baseline CRP levels.